Automatic text summarization tools help users in the biomedical domain acquire the information they need from various textual resources more efficiently. Some biomedical text summarization systems base their sentence selection on the frequency of concepts extracted from the input text. However, measures other than frequency for identifying the valuable content of the input document, together with the correlations that exist between concepts, may be more useful for this type of summarization. In this paper, we describe a Bayesian summarizer for biomedical text documents. The Bayesian summarizer first maps the input text to Unified Medical Language System (UMLS) concepts and then selects the important ones to be used as classification features. We introduce different feature selection approaches to identify the most important concepts of the text and to select the most informative content according to the distribution of these concepts. We show that, with an appropriate feature selection approach, the Bayesian biomedical summarizer can improve summarization performance. We perform extensive evaluations on a corpus of scientific papers in the biomedical domain. According to the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics, the Bayesian summarizer outperforms biomedical summarizers that rely on the frequency of concepts, as well as domain-independent and baseline methods. Moreover, the results suggest that using the meaningfulness measure and considering the correlations between concepts in the feature selection step lead to a significant increase in summarization performance.
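
To make the pipeline outlined in the abstract more concrete, the following is a minimal sketch of concept-based Bayesian sentence selection. It is an illustration under stated assumptions, not the method evaluated in the paper: sentences are assumed to be already mapped to concept identifiers (the names `concept_a`, `concept_b`, and so on are hypothetical placeholders, not real UMLS identifiers), feature selection is plain sentence-level frequency rather than the meaningfulness or correlation-based measures mentioned above, and the classifier is a generic Bernoulli naive Bayes trained on hypothetical labeled sentences.

```python
import math
from collections import Counter


def select_features(sentences, k):
    """Return the k concepts that appear in the most sentences.

    Assumption: frequency is used here only as the simplest possible
    selection rule; ties are broken alphabetically for determinism.
    """
    counts = Counter(c for s in sentences for c in set(s))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [concept for concept, _ in ranked[:k]]


def train(sentences, labels, features):
    """Estimate Bernoulli naive Bayes parameters with Laplace smoothing.

    labels[i] is True if sentence i belongs in the summary (hypothetical data).
    """
    model = {}
    for cls in (True, False):
        rows = [set(s) for s, lab in zip(sentences, labels) if lab == cls]
        prior = (len(rows) + 1) / (len(sentences) + 2)
        likelihood = {
            f: (sum(f in r for r in rows) + 1) / (len(rows) + 2)
            for f in features
        }
        model[cls] = (prior, likelihood)
    return model


def posterior_summary(sentence, model, features):
    """Posterior probability that a sentence belongs in the summary."""
    present = set(sentence)
    log_scores = {}
    for cls, (prior, likelihood) in model.items():
        log_p = math.log(prior)
        for f in features:
            p = likelihood[f]
            log_p += math.log(p if f in present else 1.0 - p)
        log_scores[cls] = log_p
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    total = sum(math.exp(v - top) for v in log_scores.values())
    return math.exp(log_scores[True] - top) / total


# Toy, hand-made data (hypothetical): each sentence is a list of concept names.
train_sentences = [
    ["concept_a", "concept_b"],
    ["concept_a"],
    ["concept_c"],
    ["concept_c", "concept_d"],
]
train_labels = [True, True, False, False]

features = select_features(train_sentences, k=2)
model = train(train_sentences, train_labels, features)
score = posterior_summary(["concept_a"], model, features)
```

In a summarizer built along these lines, sentences would be ranked by `posterior_summary` and the top-scoring ones kept until the desired compression rate is reached. Swapping `select_features` for a different selection rule is the step the abstract identifies as most influential on performance.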